LGM: Mining Frequent Subgraphs from Linear Graphs
Abstract
A linear graph is a graph whose vertices are totally ordered. Biological and linguistic sequences with interactions among symbols are naturally represented as linear graphs. Examples include protein contact maps, RNA secondary structures and predicate-argument structures. Our algorithm, linear graph miner (LGM), leverages the vertex order for efficient enumeration of frequent subgraphs. Following the reverse search principle, LGM traverses the pattern space systematically without expensive duplicate checking. Disconnected subgraph patterns are particularly important in linear graphs because of their sequential nature. Unlike conventional graph mining algorithms, which detect only connected patterns, LGM can detect disconnected patterns as well. The utility and efficiency of LGM are demonstrated in experiments on protein contact maps.
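To make the notion of a pattern in a linear graph concrete, the following is a minimal sketch, not the LGM algorithm itself: it counts support by brute force and does not use reverse search. The names `LinearGraph`, `occurs_in` and `support`, the use of vertex labels without edge labels, and the non-induced matching rule are all assumptions made for illustration. The point it shows is that, because vertices are totally ordered, a pattern only needs to be matched against increasing position tuples, and a disconnected pattern is handled exactly like a connected one.

```python
from itertools import combinations


class LinearGraph:
    """Hypothetical container: vertices are positions 0..n-1 in a fixed order."""

    def __init__(self, labels, edges):
        self.labels = list(labels)
        # Store each undirected edge as (smaller index, larger index).
        self.edges = {(min(i, j), max(i, j)) for i, j in edges}


def occurs_in(pattern, graph):
    """True if pattern embeds into graph with vertex order preserved."""
    k, n = len(pattern.labels), len(graph.labels)
    # combinations() yields strictly increasing tuples, so order is preserved.
    for positions in combinations(range(n), k):
        if any(graph.labels[p] != lab
               for p, lab in zip(positions, pattern.labels)):
            continue
        # Every pattern edge must be present between the mapped positions.
        if all((positions[i], positions[j]) in graph.edges
               for i, j in pattern.edges):
            return True
    return False


def support(pattern, database):
    """Number of database graphs that contain the pattern."""
    return sum(occurs_in(pattern, g) for g in database)


if __name__ == "__main__":
    g1 = LinearGraph("ABCD", [(0, 2), (1, 3)])
    g2 = LinearGraph("ABAC", [(0, 1), (2, 3)])
    connected = LinearGraph("AC", [(0, 1)])
    # Two separate edges (A-C and B-D): a disconnected pattern.
    disconnected = LinearGraph("ABCD", [(0, 2), (1, 3)])
    print(support(connected, [g1, g2]))
    print(support(disconnected, [g1, g2]))
```

A pattern would be called frequent when its support reaches a user-chosen threshold; enumerating candidate patterns efficiently is the part this sketch leaves out.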
Similar resources
FS3: A sampling based method for top-k frequent subgraph mining
Mining labeled subgraphs is a popular research task in data mining because of its potential applications in many scientific domains. All existing methods for this task explicitly or implicitly solve the subgraph isomorphism problem, which is computationally expensive, so they scale poorly when the graphs in the input database are large. In this work, we pr...
A Two-Phase Algorithm for Differentially Private Frequent Subgraph Mining
Mining frequent subgraphs from a collection of input graphs is an important task for exploratory data analysis on graph data. However, if the input graphs contain sensitive information, releasing discovered frequent subgraphs may pose considerable threats to individual privacy. In this paper, we study the problem of frequent subgraph mining (FSM) under the rigorous differential privacy model. W...
Feature Selection in Frequent Subgraphs (Feature Selektion auf häufigen Subgraphen)
Bioinformatics is producing a wealth of network data, ranging from molecular graphs to complex gene expression networks. To distinguish different classes of graphs, such as different functional classes of proteins, one common approach is to search for common frequent subgraphs. However, this method suffers from the fact that it quickly generates thousands or even millions of frequent subgraphs....
Mining frequent subgraphs from 'easy' classes
Recently, there has been increasing interest in mining structured data. Several frequent subgraph mining systems have been proposed. However, these usually consider general graphs. One can show that frequent subgraph mining for general graphs cannot be performed in output-polynomial time. In practice, however, data usually does not consist of arbitrary graphs but has a much simpler structure. In t...
Combining near-optimal feature selection with gSpan
Graph classification is an increasingly important step in numerous application domains, such as function prediction of molecules and proteins, computerised scene analysis, and anomaly detection in program flows. Among the various approaches proposed in the literature, graph classification based on frequent subgraphs is a popular branch: Graphs are represented as (usually binary) vectors, with c...